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ABSTRACT 

The new generation of radio synthesis arrays, such as LOFAR and SKA, have been designed to 
surpass existing arrays in terms of sensitivity, angular re solution and frequency coverage. This 
evolution has led to the development of advanced calibration techniques that ensure the deliv- 
ery of accurate results at the lowest possible computational cost. However, the performance 
of such calibration techniques is still limited by the compact, bright sources in the sky, used 
as calibrators. It is important to have a bright enough source that is well distinguished from 
the background noise level in order to achieve satisfactory results in calibration. This paper 
presents "clustered calibration" as a modification to traditional radio interferometric calibra- 
tion, in order to accommodate faint sources that are almost below the background noise level 
into the calibration process. The main idea is to employ the information of the bright sources' 
measured signals as an aid to calibrate fainter sources that are nearby the bright sources. In the 
case where we do not have bright enough sources, a source cluster could act as a bright source 
that can be distinguished from background noise. For this purpose, we construct a number 
of source clusters assuming that the signals of the sources belonging to a single cluster are 
corrupted by almost the same errors. Under this assumption, each cluster is calibrated as a 
single source, using the combined coherencies of its sources simultaneously. This upgrades 
the power of an individual faint source by the effective power of its cluster The solutions thus 
obtained for every cluster are assigned to each individual source in the cluster We give per- 
formance analysis of clustered calibration to show the superiority of this approach compared 
to the traditional un-clustered calibration. We also provide analytical criteria to choose the 
optimum number of clusters for a given observation in an efficient manner 

Key words: methods: statistical, methods: numerical, techniques: interferometric 
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1 INTRODUCTION 

Low frequency radio astronomy is undergoing a revolution as a 
new generation of radio interferometers with increased sensitiv- 
ity and resolution, such as the LOw Frequency ARray (LOFARfl 
the Murchison Wide-field Array (MWA)I3 and the Square Kilome- 
ter Array (SKAf| are being devised and some are already becom- 
ing operational. These arrays form a large effective aperture by the 
co mbination of a large num ber of antennas using aperture synthe- 
sis jxhompson et al. boOlh . In order to achieve the full potential 
of such an interferometer, it is essential that the observed data is 
properly calibrated before any imaging is done. Calibration of ra- 
dio interferometers refers to the estimation and reduction of errors 
introduced by the atmosphere and also by the receiver hardware, 
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before imaging. We also consider the removal of bright sources 
from the observed data part of calibration, that enable imaging the 
weak background sources. For low frequency observations with a 
wide field of view, calibration needs to be done along multiple di- 
rections in the sky. Proper calibration across the full field of view is 
required to achieve the interferometer's desired precision and sen- 
sitivity giving us high dynamic range images. 

Early radio astronomy used external (classical) calibration 
which estimates the instrumental unknown parameters using a ra- 
dio source with known properties. This method has a relatively 
low computational cost and a fast execution time, but it generates 
results strongly affected by the accuracy with which the source 
properties are known. In addition, existence of an isolated bright 
source, which is the most important requirement of external cal- 
ibration, in a very wide field of view is almost impractical, and 
even when it does, external calibration would only give informa- 
tion along the direction of the calibra tor. The external calibratio n 
was then improved by self-calibration dPearson & Readheadll984t) . 
Self-calibration has the advantage of estimating both the source 
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and instrumental unknowns, without thie need of hiaving a prior 
knowledge of the sky, only utilizing the instrument's observed data. 
Moreover, it can achieve a superior precision by iterating between 
the sky and the instrumental parameters. 

The accuracy of any calibration scheme, regardless of the 
used technique, is determined by the level of Signal to Noise Ra- 
tio (SNR). This limits the feasibility of any calibration scheme 
to only bright sources that have a high enough SNR to be distin- 
guished from the backg round noise d Semardi et al.llzoiot iLiu et al.l 
l2009l ; |Pindoretal.l2010h . Note that interferometric images are pro- 
duced using the data observed during the total observation (inte- 
gration) time. However, calibration is done at shorter time inter- 
vals of that total duration. This increases the noise level of the data 
for which calibration is executed compared to the one in the im- 
ages. In other words, there are some faint sources that appear well 
above the noise in the images while they are well below the noise 
level during calibration. It has been a great challenge to calibrate 
such faint sources having a very low S NR. In th is paper we present 
clustered self-calibration, introduced in lKazem i et al. (2011a), and 
emphasize its better performance compared to un-clustered calibra- 
tion below the ca libration noise level. Existing un-clu stered calibra- 
tion approaches Jlntema et al.ll200^ ISmimov||201 ll) can only han- 
dle a handful of directions where strong enough sources are present 
and we improve this, in particular for subtraction of thousands of 
sources over hundreds o f directions in the sky, as in the case for 
LOFAR ( lBregmanll2012h . 

The implementation of clustered calibration is performed by 
clustering the sources in the sky, assuming that all the sources in a 
single cluster have the same corruptions, and calibrating each clus- 
ter as a single source. At the end, the obtained calibration solutions 
for every source cluster is assigned to all the sources in that cluster. 
This procedure improves the information used by calibration by in- 
coiporating the total of signals observed at each cluster instead of 
each individual source's signal. Therefore, when SNR is very low, it 
provides a considerably better result compared to un-clustered cal- 
ibration. Moreover, calibrating for a set of source clusters instead 
of for all the individual sources in the sky reduces the number of 
directions along which calibration has to be performed, thus con- 
siderably cutting down the computational cost. However, there is 
one drawback in clustered calibration: The corruptions of each in- 
dividual source in one given cluster will almost surely be slightly 
different from the corruptions estimated for the whole cluster by 
calibration. We call this additional error as "clustering eiTor" and 
in order to get the best performance in clustered calibration, we 
should find the right balance between the improvement in SNR as 
opposed to degradation by clustering error 

Recently, clustering me thods have gained a lot of popu larity in 
dealing with large data sets jKaufman & Rousseeuw|[l990l) . There 
are various clustering approaches that could be applied to calibra- 
tion based on their specific ch aracteristics. An overvi ew of different 
clustering methods is given in lXu & Wunscl] ( |2008|) . Clustering of 
radio sources should take into account (i) their physical distance 
from each other and (ii) their individual intensity. The smaller the 
angular separations of sources are, the higher the likelihood that 
they share the same corruptions in their radiated signals. More- 
over, in order to get the best accuracy in the calibration results, 
there should be a balance between effective intensities of different 
clusters. Thus, in the clustering procedure, every source should be 
weighted suitably according to its brightness intensity. 

The brightness distribution of radio source in the sky is a 
power law and the spatial distribution is Poisson. Therefore, clus- 
tering the sources via probabilistic clustering approaches is com- 



putationally complex. Hierarchical clustering djohnsonll 19671) is a 
well-known clustering approach with a straight forward implemen- 
tation suitable for our case. But, its computational cost grows ex- 
ponentially with the size of its target data set which can be a dis- 
advantage when the number of sources is huge. Weighted K-means 
clustering (Kerdprasop et al. 2005; MacOueen 1967) is also one of 
the most used of clustering schemes applicable in clustered calibra- 
tion. The advantage of this clustering technique is its low computa- 
tional cost, which is proportional to the number of clusters and the 
size of the target data set. Nonetheless, we emphasize that the com- 
putational time taken by any of the aforementioned clustering al- 
gorithms is negligible compared with the computational time taken 
by the actual calibration. Therefore, we pursue all clustering ap- 
proach es in this p aper. However, the use of Fuzzy C-means cluster- 
ing ( Bezdek 1981) in clustered calibration requires major changes 
in the calibration data model and will be explored in future work. 

This paper is organized as follows: First, in section |2] we 
present the general data model used in clustered calibration. In sec- 
tion [3] we present modified weighted K-means and divisive hier- 
archical clustering for clustering sources in the sky. Next, in sec- 
tion |4l we focus on analyzing the performance of clustered cali- 
bration, with an a priori clustered sky model, and compare the re- 
sults with un-clustered calibration. In clustered calibration, there is 
contention between the improvement of SNR by clustering sources 
and the errors introduced by the clustering of sources itself. Thus, 
we relate the clustered calibration's performance to the effective 
Signal to Interference plus Noise Ratio (SINR) obtained for each 
cluster. For this purpose, we use statistica l estimati on theory and 
the Cramer-Rao Lower Bounds (CRLB) ( iKavlll993h . In sectionO 
we derive criteria for finding the optimum number of clusters for a 
given sky. We use the SINR analysis and adopt Akaike's Informa- 
tion C riterion (AIC) {Akaike 1973) and the Likelihood Ratio Test 
(LRT) jGravesiri978l) to estimate the optimum number of clusters. 
We present simulation results in section |6]to show the superiority 
of clustered calibration to un-clustered calibration and the perfor- 
mance of the presented criteria in detecting the optimum number 
of clusters. Finally, we draw our conclusions in section|7] Through 
this paper, calibration is executed by the S pace Alternating Gen- 
eralized Expectation maximization (SAGE) jFessler & Hero|[r994l : 
lYatawatta et alj|2009l : iKazemi et ai]l201 id) algorithm. Moreover, 
in our simulations, radio sources are considered to be uniformly 
distributed in the sky and their flux intensities follow Raleigh dis- 
tribution, which is the worst case scenario. In real sky models, there 
usually exist only a few (two or three) number of bright sources 
which dominate the emission. In the presence of these sources and 
the background noise, it is impractical to solve for the other faint 
sources in the field of view individually. Therefore, obtaining a bet- 
ter calibration performance via the clustered calibration approach, 
compared to the un-clustered one, is guaranteed. Therefore, in sec- 
tion [6] we illustrate this using simulations in which the brightness 
distributio n of sources is a pow e r law with a very steep slope. On 
top of that. lKazemi et alj fioH) : lYatawatta et alj fsubmitted 20I2h 
also present the performance of clustered calibration on real obser- 
vations using LOFAR. 

The following notations are used in this paper: Bold, lower- 
case letters refer to column vectors, e.g., y. Upper case bold letters 
refer to matrices, e.g., C. All parameters are complex numbers, un- 
less stated otherwise. The inverse, transpose, Hermitian transpose, 
and conjugation of a matrix are presented by (.)~^, (•)^, (O^' ™d 
(.)*, respectively. The statistical expectation operator is referred to 
as E{.}. The matrix Kronecker product and the proper (strict) sub- 
set are denoted by ® and C, respectively. !„ is the n x n identity 
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matrix and is tlie empty set. Tiie Kronecker delta function is pre- 
sented by Sij . R and C are the sets of Real and Complex numbers, 
respectively. The Frobenius norm is shown by 1 1 . 1 1 . Estimated pa- 
rameters are denoted by a hat, (.). All logarithmic calculations are 
to the base e. The multivariate Gaussian and Complex Gaussian 
distributions are denoted by N and CN, respectively. 



2 CLUSTERED SELF-CALIBRATION DATA MODEL 

In this section, we present the measurem ent equation of a po- 
larimetric clustered calibration in detail dHamakeret al.lll996l : 
lHamakej|2003) . Suppose we have a radio interferometer consist- 
ing of A'^ polarimetric antennas where each antenna is composed 
of two orthogonal dual-polarization feeds that observe K compact 
sources in the sky. Every i-th source signal, i £ {1, 2, . . . , K}, 
causes an induced voltage of Vpi = [vxpi vypi]^ at X and Y 



and frequency resolution is performed to estimate and correct for 



dipoles of every p-th antenna, p G {1,2,..., A'^}. In practice. 



(1) 



where e, = \exi evi] is the source' s electric field vecto r and 
Jpi represents the 2x2 lones matrix (' Hamaker et al.llT996l) cor- 
responding to the direction-dependent gain corruptions in the radi- 
ated signal. These corruptions are originated from the instrumental 
(the beam shape, system frequency response, etc.) and the prop- 
agation (tropospheric and ionospheric distortions, etc.) properties 
which later on, in this section, will be explained in more detail. 

The signal Vp obtained at every antenna p is a linear su- 
perposition of the K sources corrupted signals, Vpi where i £ 
{1, 2, . . . , K}, plus the antenna's thermal noise. The multitude of 
ignored fainter sources also contributes to this additive noise. 

The voltages collected at the instrument antennas get corrected 
for geometric delays, based on the location of their antennas, and 
some instrumental effects, like the antenna clock phases and elec- 
tronic gains. Then, they are correlated in the array's correlator to 
generate visibilities (Hamaker et al. 1996). The visibility matrix of 
the baseline p — q, E{vp ® v^}, is given by 



(2) 



In Eq. £ C^, P = AKN, is the unknown instrumental 
and sky parameter vector, Np, is the additive 2x2 noise matrix 
of the baseline p ~ q, and C^pgy is the Fourier tra nsform of the 
i-th source coherency matr ix d = E{ei ® ef^} l lBom & Wolj 
Il999l : iHamaker et allll996h . If the j-th source radiation intensity 
is 7i, then Ci = Considering this source to have equatorial 

coordinate, (Right Ascension a. Declination 5), equal to (oi, 
and the geometric components of baseline p — g to be (it, u, w), 
then 



where j'^ = — 1 and. 



(3) 



I — sin[ai — Qo)cos(5i), 

m — cos{5o)sin{5i) — cos{ai — ao)cos{5i)sin{5o) , 

are the source direction components corresponding to the observa- 
tion phase reference of (ao, So) (Thompson et al. 2001). The errors 
common to all directions, such as the receiver delay and amplitude 

Initial calibration at a finer time 



Gp VpqGq 



(4) 



The remaining errors are unique to a given direction, but residual 
errors in Gp-s are also absorbed into these errors, which are de- 
noted by Jpi in the usual notation. 

Vectorizing Eq. (|4j, the final visibility vector of the baseline 
p — g is given by 



Vpq = ^ J*qi{0) ® Jpi(^)vec(Ci{pq}) + Up 



(5) 



Stacking up all the cross correlations (measured visibilities) and 
noise vectors in y and n, respectively, the un-clustered self- 
calibration measurement equation is given by 



y = ^s,(e) + n. 



(6) 



In Eq. (O, y, n G C*^, M = 2N{N - 1), the noise vector is con- 
sidered to have a zero mean Gaussian distribution with covariance 
n, n ~ A/'(0, IImxm), and the nonlinear function s; (6) is defined 
as 



s,(6l) 



J2iW® JHWvec(C,{p,}) 
J|,(6I)® JH(6')vec(C,{p,}) 

_ ® J(iv-i)i(^)vec(C,{p5}) _ 



(7) 



Calibration is essentially the Maximum Likelihood (ML) esti- 
mation of the unknown parameters (P complex values or 2P real 
values), or of the Jones matrices J(^), from Eq. ^ and removal of 
the K sources. Note that calibration methods could also be applied 
to the uncorrected visibilities of (|4j to estimate Gp and errors 
as well. 

The Jones matrix Jpi, for every i-th direction and at every p-th 
antenna, is given as 



Epj 'ZpiF pi . 



(8) 



In Eq. {Sj, Epi, Zpi, and Fpi are the antenna's voltage pattern, 
ionospheric phase fluctuation, and Faraday Rotation Jones matri- 
ces, respectively. In practice, the E, Z, and F Jones matrices ob- 
tained for nearby directions and for a given antenna are almost the 
same. Thus, for every antenna p, if the i-th and j-th sources have a 
small angular separation from each other, we may consider 

Jpi = "-^PJ • C-^) 

This is the underlying assumption for clustered calibration. 

Clustered calibration first assigns source clusters, Li for i £ 
{1,2,..., Q} where Q <^ K, on which the sky variation is consid- 
ered to be uniform. Then, it assumes there exists a unique Jpi which 
is shared by all the sources of the i-th cluster Li, i £ {1,2, ... , Q}, 
at receiver p, p G {1,2,..., A^}. Based on that, the visibility at ev- 
ery baseline p — q, given by Eq. l|4), is reformulated as 



Vp, = E Jp'WiE c,{p,}}j^w +Np 



(10) 



In Eq. (Toll, N pq is the clustered calibration's effective noise at 
baseline p — g which will be explicitly discussed at section|4] Note 
that clustered calibration estimates the new unknown parameter 
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G where P — 4QN. Denoting the effective signal of ev- 
ery i-th cluster at baseline p — g by 



leLi 



(11) 



the clustered calibration visibility vector at this baseline (vectorized 
form of Eq. ^) IS 



Vpg = ^ JJi(^) ® Jpi(^)vec(Ci{p,}) + rip,. (12) 

1=1 

Finally, stacking up the visibilities of all the instrument's baselines 
in vector y, clustered calibration's general measurement equation 
is resulted as 



y = ^s,(6l)+n. 



(13) 



In Eq. ( 113b Si is defined similar to s; in Eq. Q where J and C are 
replaced by J and C, respectively. 

Because of the similarity between the clustered and the un- 
clustered calibration's measurement equations presented by Eq. 
il3i and Eq. respectively, they could utilize the same cali- 
bration techniques. Thus, the only difference between these two 
types of calibration is that clustered calibration solves for clusters 
of sources instead of for the individual ones. That upgrades the sig- 
nals which should be calibrated for from Eq. (|3) to Eq. (II 11 . 



3 CLUSTERING ALGORITHMS 

Clustering is grouping a set of data so that the members of the same 
group (cluster) have some similarities dKaufman & Rousseeuwl 
Il990l) . This similarity is defined based on the application of the 
clustering method. 

We need to define clustering schemes in which two radio 
sources merge to a single cluster based on the similarity in their 
direction dependent gain errors ( Eq. (|8] ). Radiations of sources 
that are close enough to each other in the sky are assumed to be 
affected by almost the same comiptions ( Eq. (|9) ). Based on that, 
we aim to design source clusters with small angular diameters. On 
the other hand, every cluster's intensity is the sum of the intensities 
of its members ( Eq. dllt ). In order to keep a balance between dif- 
ferent clusters' intensities, we intend to apply weighted clustering 
techniques in which the sources are weighted proportional to their 
intensities. 

Suppose that the K sources, xi,X2, ■ ■ ■ ,xk have 
{ai, 5i), {a2, 52), ■ ■ ■ , {qk ,Sk) equatorial coordinates, re- 
spectively. The aim is to provide Q clusters so that the objective 
function / = X]^=i ^(^9) minimized. D{Lq) is the angular 
diameter of cluster Lq, for q £ {1, 2, . . . , Q}, defined as 



D{Lq) = max {d{xi,Xj)\xi,Xj G Lq}, 



(14) 



and d{., .) is the angular separation between any two points on the 
celestial sphere. Having two radio sources a and b with equatorial 
coordinates {ua, Sa) and {at, 5b), respectively, the angular separa- 
tion d{a, b), in radians, is obtained by 



tan 



.1 \/cos-^(5(,sin'^ Aq + [cos5aSin(5i, — sin^aCos^tcosAa]' 



sin5asin<56 + cos5aCos(5bCosAa 
where Aa = Of, — Qa. 



(15) 



For defining the centroids, we associate a weight to every source 
Xi, as 



w{Xi 



h 

I* 



fo,ie{l,2,...,K}, 



(16) 



where U is the source's intensity and /* = min {/i, I2, ■ ■ ■ ,Ik}- 
Applying the weights to the clustering procedure, the centroids of 
the clusters lean mostly towards the brightest sources. That causes 
a tendency in faint sources to gather with brighter sources close 
to them into one cluster. Thus, their weak signals are promoted 
being added up with some brighter sources' signals. Moreover, very 
strong sources will be isolated such that their signals are calibrated 
individually, without being affected by the other faint sources. 

We cluster radio sources using weighted K-means 
dKerdprasop et al.' '20051) and divisive hierarchical clustering 
(Johnson 1967) algorithms. Since the source clustering for cal- 
ibration is performed offline, its computational complexity is 
negligible compared with the calibration procedure itself. 

3.1 Weighted K-means clustering 

Step 1. Select the Q brightest sources, x\* , X2* , ■ ■ ■ , xq* , and ini- 
tialize the centroids of Q clusters by their locations as 

cq = [aq',Sq.], fot § G { 1 , 2, . . . , Q} , G {1*,2*,...,Q*}. 

(17) 

Step 2. Assign each source to the cluster with the closest centroid, 
defining the membership function 



rriLgixi) = 



1, if d{xi, Cq) = min{d{xi, Cj)\j = 1,. . . ,Q} 



forgG {1,2,...,Q}. (18) 



0, Otherwise 

Step 3. Update the centroids by 

_ T,'t=i'rr^Lgixi) WjXi 
Y.i=xrnL^{xi) Wi 

Repeat steps 2 and 3 until there are no reassignments of sources to 
clusters. 



3.2 Divisive hierarchical clustering 

Step I. Initialize the cluster counter Q' to 1, assign all the K 
sources to a single cluster Li and to a set of null clusters A. 
Step 2. Choose cluster Lq* , for g* G {1, 2, . . . , Q'} - A, with the 
largest angular diameter 



D(V) =max{D(L,)|5 G {1,2, . . . ,Q'} - A}. 



(19) 



Step 3. Apply the presented weighted K-means clustering tech- 
nique to split Lq* into two clusters, L'q* and Lq* . 

Step 4. If D{L'q. ) + D{L'^* ) < D{Lq* ), then set Q' = Q' + 1, 
Lq* = L'q*,LQ, = Lq,,SLndA = 0, otherwise set A = Au{q*}. 
Repeat steps 2, 3, and 4 until Q' = Q. 



3.3 Comparison of clustering methods 

Hierarchical clustering method tends to design clusters with al- 
most the same angular diameters, whereas, the K-means clustering 
method tends to keep the same level of intensity at all its clusters. 
In practice, since hierarchical clustering method makes less errors 
in dedicating the same solutions to sources in small clusters, it per- 
foiTns better than Weighted K-means clustering in a clustered cali- 
bration procedure. But, when the number of sources (and clusters) 
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Figure 1. A simulated 8 by 8 degrees sky of fifty point sources witli inten- 
sities below 3 Jy. The source positions and their brightness are following 
uniform and Rayleigh distributions, respectively. The marker sizes are pro- 
portional to sources intensities. 
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Figure 2. Fifty point sources are clustered into ten source clusters by 
Weighted K-means clustering technique. There is not a good balance be- 
tween different clusters angular diameters. 



is very large (Q ^ 100), its prohibitive computational costs makes 
the fast K-means clustering method preferable. 



3.3.1 Example 1: Weighted K-means and hierarchical clustering 

We simulate an 8 by 8 degrees sky with fifty point sources with in- 
tensities below 3 Jansky (Jy). The source positions and their bright- 
ness follow uniform and Rayleigh distributions, respectively. The 
result is shown by Fig.[T]in which the symbol sizes are proportional 
to intensities of sources. Weighted K-means and divisive hierarchi- 
cal clustering methods are applied to cluster the fifty sources into 
ten source clusters. The results are presented in Fig. |2]and Fig. [3] 
respectively. Fig. |2] shows that the Weighted K-means clustering 
could design source clusters with considerably large angular diam- 
eters. Assigning the same calibration solutions to the sources of 
these large clusters could cause significant errors. However, as Fig. 
[3] shows, this is not the case for the hierarchical clustering and it 
constructs clusters with almost the same angular diameters. 

Since the number of sources in this simulation is not that large 
(K = 50), the difference between execution time of the two clus- 
tering methods is not significant. Hence in such a case, the use of hi- 
erarchical clustering method, rather than the Weighted K-means, is 
advised. However, this is not the case when we have a large number 
of sources, and subsequently a large number of source clusters, in 
the sky. To demonstrate this, we use the two clustering techniques 
for clustering thousand of sources (K = 1000) into Q source clus- 
ters, Q £ {3, 4, . . . , 100}. The methods' computational times ver- 
sus the number of clusters are plotted in Fig. |4] As Fig. |4] shows, 
for large Qs, the computational cost of Weighted K-means is much 
cheaper than the one of the hierarchical clustering. That can make 
the Weighted K-means clustering method more suitable than the 
hierarchical clustering for such a case. 



4 PERFORMANCE ANALYSIS 

In this section, we explain the reasons for clustered calibration's 
better performance, compared to un-clustered calibration, at a low 




278 276 274 272 270 
a [deg.] 

Figure 3. Fifty point sources are clustered into ten source clusters via hier- 
archical clustering method. Different clusters have almost the same angular 
diameters. 



SNR jKazemi et alJ 1201 j ). This superiority is in the sense of 
achieving an unprecedented precision in solutions with a consider- 
ably low computational complexity, given the optimum clustering 
scheme. In the next section, we present different criteria for finding 
the optimum number of clusters at which the clustered calibration 
performs the best. 

4.1 Cramer-Rao Lower Bounds 

The most fundamental assumption in clustered calibration is that 
the sources at the same cluster have exactly the same corruptions 
in their radiated signals. This assumption is of course incorrect, 
nonetheless, it provides us with a stronger signal, the sum of the 
signals in the whole cluster We present an analytic comparison of 
clustered and un-clustered calibration where we use the Cramer- 
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Figure 4. Weighted K-means and divisive hierarcliical clustering metliods 
computational costs. For small number of source clusters, there is no dif- 
ference between execution times of the two clustering methods. But, when 
the number of source clusters is large, the computational cost of Weighted 
K-means becomes much cheaper than the one of the hierarchical clustering. 



Rao Lower Bound (CRLB) tea\< ll993l) as a tool to measure the 
performance of the calibration. 

4.1.1 Estimations of CRLB for two sources at a .single cluster 

For simplicity, first consider observing two point sources at a single 
baseline, lets say baseline p — q. Based on Eq. the visibilities 
are given by 



■ JplCl{p,}Jqi + Jp2C2{pq} Jq2 



(20) 



in the un-clustered calibration strategy. Vectorizing Vpq, the visi- 
bility vector is 



y = Jg 



9l 



:(C 



+ J*2 ® Jp2vec(C2{p,}) + rip 



Assuming rip 



' CA/'(0, 0-^14), we have 



where 



s{e)= jp>(e)vec(c, 



{vq}) 



(21) 



(22) 



Using Eq. the log-likelihood function of the visibility vector 
y is given by 



my) 



-A\n{-}-o-\y-s{B))"{y-s{B)). 



(23) 



CRLB is a tight lower bound on the error variance of any un- 
biased parameter estimators (Kay 1993). Based on its definition, if 
the log-likelihood function of the random vector y, C{6\y), satis- 
fies the "regularity" condition 



my) 



■ 0, for all 9, 



then for any unbiased estimator of 0, 0, 

var(?0 > [T'\d)U, fori G {!,..., M}, 
where I{6) is the Fisher information matrix defined as 

'd^m\y) 



m) = -^y 



(24) 



(25) 



(26) 



In other words, the variance of any unbiased estimator of the un- 
known parameter vector 6 is bounded from below by the diagonal 
elements of 

Using J23b and J26b . the Fisher information matrix of the vis- 
ibility vector y is obtained as 

X{e) = 2cr~^ 5Re(jf J,), (27) 

where Jg is the Jacobian matrix of s with respect to 

2 Q 

MO) =^^{J*, ® Jp,}[l4®vec(C,;{p,})]. (28) 

i = l 

Thus, variations of any unbiased estimator of parameter vector 0, 
lets say 6, is bounded from below by the CRLB as 



Var(?) > [2a-'' 9^c(jf Js 



(29) 



Lets try to bound the error variations of the clustered calibra- 
tion parameters assuming that the two sources construct a single 
cluster, called cluster number 1. We reform Eq. (I20t as 



Vpq — Jpl(Ci{pg} 



+ C 



2{p<j} 



)Jql+ri{pg}- 



■ 2{p9} 



+Np„, (30) 



where T^p^y, referred to as the "clustering error" matrices, are 
given by 



{pi} 



T (-1 T« 



J„iC 



pi *^i{pg}*^ ql 1 



(31) 



and Jpi [0) is the clustered calibration solution at receiver p. 
Eq. OOt implies that what is considered as the noise matrix Np, : 
the clustered calibration data model, Eq. ( IIOK is in fact 



ri{pq} + r2{pq} + Np 



(32) 



Vectorizing Eq. OOt , the clustered calibration visibility vector is 
obtained by 



y = J*i ® Jpivec(Ci{pq} + C2{p5}) + np 



(33) 



where Hp, = vec(Npg). 

We point out that depending on the observation as well as the 
positions of the two sources on the sky, the clustering error Tj^pg} 
will have different properties. However, in order to study the per- 
formance of the clustered calibration in a statistical sense, and to 
simplify our analysis, we make the following assumptions. 

(i) Consider statistical expectation over different observations 
and over different sky realizations where the sources are randomly 
distributed on the sky. In that case, almost surely E{J} — >■ E{J} 
and consequently 



E{r,{p,j}^o. 



(34) 



In other words, we assume the clustering error to have zero mean 
over many observations of different parts of the sky. 

(ii) We assume that the closer the sources are together in the sky, 
the smaller the errors introduced by clustering would be. Therefore, 
given a set of sources, the clustering error will reduce as the num- 
ber of clusters increase. In fact this error introduced by clustering 
vanishes when the number of clusters is equal to the number of 
sources (each cluster contains only one source). Therefore, given a 
set of sources, the variance of r^ip,} will decrease as the number 
of clusters increase. 

Using Eq. ( I34t and bearing in mind that E{Npq} = 0, we 

can consider Hp, ~ CA/'(0, 5^X4) where E{np5n^} — a^l4. 
Therefore, 

y^CU{s,a'U), 
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Figure 5. Clustered and un-clustered calibrations CRLB. When the effec- 
tive noise power of the clustered calibration, | |N| p, is small enough, then 
its CRLB is lower than of the un-clustered calibration's and it reveals a 
supeiior performance. 



s = J*i ® Jpivec(Ci{pg} + C2{pq}), 
and similar to Eq. ( 129) , we have 

Var(e') > [25~^ lRe(J#J5)]"\ (35) 

We use numerical simulations to compare the un-clustered and 
clustered calibrations performances via their CRLBs which are 
given by Eq. l l29t and Eq. l l35t . respectively. 



4.1.2 Example 2: CRLB for two sources and one cluster 

We simulate a twelve hour observation of two point sources with 
intensities Ji = 11.25 and I2 = 2.01 Jy at sky coordinates {l,m) 
equal to (-0.014, -0.005) and (-0.011, -0.010) radians, respec- 
tively. We use the uv-coverage of Westerbork Synthesis Radio Tele- 
scope (WSRT) with 14 receivers in this simulation. 
We Consider the J Jones matrices in Eq. J20t to be diagonal. 
Their amplitude and phase elements follow W(0.75, 0.95) and 
W(0.003, 0.004) distributions, respectively. The background noise 
is N ~ CN{0, 101). Jones matrices of the clustered calibration, 
Jpi for j3 = 1,2, are obtained as Jpi + W(0.02, 0.40)e^"'° '''^' . 
For 20 realizations of J matrices, we calculated CRLB of the un- 
clustered and clustered calibrations using Eq. ( 129 ) and Eq. l |35t , 
respectively. The results are presented in Fig. |5] As shown in this 
figure, for small enough errors matrices F of Eq. ( I3U , the clus- 
tered calibration's CRLB stands below the un-clustered calibra- 
tion's CRLB. On the other hand, with increasing power of error ma- 
trices, or the power of effective noise N, the un-clustered calibra- 
tion's CRLB becomes lower than the clustered calibration's CRLB. 



4.1.3 Analysis of CRLB 

Generally, if source 1 is considerably brighter than source 2, 
||Ci{pq}|| S> ||C2{pq}||, and if the weak source power is much 



lower than the noise level, IIC 



2{p>j}l 



« !!Np 



then clustered 



calibration's performance is better than un-clustered calibration. 
Note that the worst performance of both calibrations is at the 
faintest source and we are more concerned to compare the CRLBs 
for this source. 



The CRLBs obtained for the un-clustered and clustered cal- 
ibrations in Eq. ( 129) and Eq. ( I35t , respectively, are both almost 
equal to the inverse of the Signal to Interference plus Noise Ratio 
(SINR), SINR~^. In un-clustered calibration, the effective signal 
for the faintest source is C2{pq} where the noise is Np,. There- 
fore, SINR for this source is 



SINR2 = 



^2{pq]\ 



IN, 



(36) 



But, in clustered calibration, the effective signal and noise are 



Ci 



{pi} 



+ C 



for the cluster is 



2{p<j} 



SINRc 



and Npq, respectively. Thus, SINR 



\^Vq\? 



Clustered calibration has an improved performance when 
SINRc > SINR2. 



(37) 



(38) 



Consider the two possible extremes in a clustered calibration 
procedure: 

(i) Clustering many sources in a large field of view to a very 
small number of clusters. In this case, the angular diameter of a 
cluster is probably too large for the assumption of uniform corrup- 
tions to apply. Subsequently, dedicating a single solution to all the 
sources of every cluster by clustered calibration introduces cluster- 
ing error matrices T with a large variance (see Eq. (131)). Having 
high inteiference power, the clustered calibration effective noise 
N of Eq. ([32} becomes very large. Therefore, clustered calibration 
SINR will be very low and it does not produce high quality results. 

(ii) Clustering sources in a small field of view to a very large 
number of clusters. In this case, the variance of V matrices are al- 
most zero while the signal powers of source clusters are almost as 
low as of the individual sources. Therefore, the SINR of clustered 
calibration is almost equal to the un-clustered calibration SINR and 
the calibration performance is expected to be almost the same as 
well. 

Thus, the best efficiency of clustered calibration is obtained at the 
smallest number of clusters for which Eq. (138) is satisfied. We use 
the SINR of Eq. j38) as an efficient criterion for detecting the opti- 
mum number of clusters. 



4.1.4 Generalization to many sources and many clusters 

For the visibility vector y of un-clustered calibration's general data 
model, presented by Eq. (|6}, we have 



y~CAA(^s.(e),n). 



(39) 



Therefore, the CRLB of un-clustered calibration is 

K K 



var(e) > 



2fRe (^J..(e))^n-^(^J.,(e)) 



where Js . is the Jacobian matrix of Si with respect to 0. 

Computing the exact CRLB is more complicated when we 
have clustered calibration. In the clustered calibration measurement 
equation, given by Eq. ( 113) , we have 



(40) 
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where n is the un-clustered cahbration's noise vector, 

r, = [vec(r,{i2})^ . . . vec(ri{(jv-i)iv})^l^, (41) 

and r^ipq} is given by Eq. iiW . Due to the existence of the nuisance 
parameters Ti in the clustered cahbration data model, calculation 
of its conventional CRLB is impractical. This leads us to the use of 
Cramer- Rao like bounds devised in the presence of the nuisance pa- 
rameters (Gini & Reggiannini'2000'). We apply the Modified CRLB 
(MCRLB) ( Gini et al. 1998) to the performance of clustered cali- 
bration. ^ 

The MCRLB for estimating the errors of in the presence of 
the nuisance parameters F (clustering error) is defined as 



var(e) ^ 



E 



y.r 



^4rin{P(y|r;0)} 



(42) 

where P{y\T\0) is the Probability Density Function (PDF) of the 
visibility vector y assuming that the F matrices of Eq. ( I41t are a 
priori known. Since n ~ CN{0, H), from Eq. J13t we have 



(43) 



x10" 



CO 

o 
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Figure 6. Clustered calibration's MCRLB. For very small Q, where the 
effect of interference is large, MCRLB is high. By increasing the number 
of clusters, MCRLB decreases and reaches its minimum where the best 
perfomiance of the clustered calibration is expected. After that, due to the 
dominant effect of the background noise, MCRLB starts to increase until it 
reaches the un-clustered calibration CRLB. 



and therefore in Eq. (1421 . — Ey|r 
is called the modified Fisher information, is equal to 



jlJ,ln{P(y!r;^)} 



which 



i — 1 i— 1 z — 1 i — 1 

Note that Ey_r in Eq. ( 142b could be estimated by Monte-Carlo 
method. 

As a rule of thumb, reducing the heavy computational cost of 
MCRLB, one can interpret the SINR test of Eq. {SU as follows: If 
in average the effective SINR of clustered calibration, SINRc, gets 
higher than the effective SINR of un-clustered calibration obtained 
for the weakest observed signal, SINR„, 



E{SINRc} > E{SINR„}, 



(44) 



then clustered calibration can achieve a better results. In Eq. ( i44b . 
the expectation is taken with respect to the noise N, error matrices 
F, and all the baselines. 

4.1.5 Example 3: MCRLB and SINR estimations 

We simulate WSRT including = 14 receivers which observe 
fifty sources with intensities below 15 Jy. The source positions 
and their brightness follow uniform and Rayleigh distributions, re- 
spectively. The background noise is N ~ CA/'(0, 151^/), where 
M = 2N{N — 1) = 364. We cluster sources using divisive 
hierarchical clustering, into Q number of clusters where Q G 
{3, 4, . . . , 50}. Clustered calibration's Jones matrices, J, are gener- 
ated as W (0.9, 1.1) e-''^'''''^'^' . Since for smaller number of clusters, 
we expect larger interference (errors) in clustered calibration's so- 
lutions, for every Q, we consider J^fii ~ CA/'(0, ^Im). The 
choice of the complex Gaussian distribution for the error matrices 
F is due to the central limit theorem and the assumptions made in 
section B.l.ll 

We proceed to calculate the clustered calibration's MCRLB, 
given by Eq. (|42ll, and E{SINRc}, utilizing the Monte-Carlo 
method. Jacobian matrices for MCRLB are calculated numerically 
and in computation of E{SINRc}, signal power of every cluster is 
obtained only using the cluster's brightest and faintest sources. The 
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Figure 7. Clustered calibration's SINR. SINR is low for small Q, when 
the interference is large. By increasing the number of clusters the SINR 
increases and gets its highest level for which the best performance of the 
calibration is expected. After that, it decreases due to the dominant effect of 
the background noise, and converges to the un-clustered calibration SINR. 



estimated results of MCRLB and E{SINRc} are presented by Fig. 
[6] and Fig. [T] respectively. As we can see in Fig.|6l for very small 
Q, where the effect of interference is large, MCRLB is high. By in- 
creasing the number of clusters, MCRLB decreases and reaches its 
minimum where the best performance of the clustered calibration is 
expected. After that, due to the dominant effect of the background 
noise, MCRLB starts to increase until it reaches the CRLB of un- 
clustered calibration. The same result is derived from E{SINRc} 
plot of Fig.|7] As Fig.|7]shows, E{SINRc} is low for very small Q, 
when the interference (i.e., the error due to clustering) is large. By 
increasing the number of clusters, E{SINRc} increases and gets its 
peak for which the clustered calibration performs the best. After 
that, it decreases and converges to the E{SINR} of un-clustered 
calibration. 
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4.2 Computational cost 

In the measurement equation of un-clustered calibration, presented 
in Eq. {SJ, we iiave M = 2N{N — 1) constraints given by the 
visibility vector y, and need to solve for P = iKN unknown 
parameters O.lf P > M, then Eq. ^ will be an under-determined 
non-linear system. This clarifies the need of having a small enough 
A'^ (number of antennas) and a large enough K (number of sources) 
for estimating 0. However, clustered calibration, Eq. ( 113) . has the 
advantage of decreasing the number of directions, K, relative to the 
number of source clusters, Q <^ K. This considerably cuts down 
the number of unknown parameters P that needs to be calibrated, 
thus reducing the computational cost of calibration. 



5 SELECTION OF NUMBER OF CLUSTERS 

Consider a clustered calibration procedure with a predefined clus- 
tering scheme. There is no guarantee that the calibration results 
for Q number of clusters, where Q £ {1, 2, . . . , K} is randomly 
chosen, is the most accurate. Thus, we seek the optimum number 
of clusters at which the clustered calibration performs the best. In 
this section, we describe th e use of: (i) Akaike's Information Cri- 
terion ( AIC) jAkaikdl 19731) , as well as (ii) Likelihood-Ratio Test 
(LRT) jGravesf 19781) . in finding this optimum Q for a given ob- 
servation. Some other alternative criteria could also be found in 



IWax & Kailath (1985). Note that for different clustering schemes 
the optimum Q is not necessarily the same. 



5.1 Akaike's Information Criterion (AIC) 

We utilize Akaike's Infomation Criterion (AIC) to find the opti- 
mum Q for clustered calibration. 

Consider having n ~ CA/'(0, (t^Im) in the general data model of 
clustered calibration, Eq. il3) . Then, the log-likelihood of the visi- 
bility vector y is given by 

£.(e\y) = -A/ log TT - log 5^ 

, Q Q 



-(y-^s,(0))^(y-^s,W). 



(45) 



i=l j=l 

The maximum likelihood estimation of the noise variance is 

Q Q 



(46) 



i = l 



Substituting Eq. l l46t in Eq. l l45t . we arrive at the maximum likeli- 
hood estimation of 6, 

£.(0\y) = -AflogTT-Af 

- A^Iog{^(y - J2^m)"{y - J2^^m}■ (47) 
Using Eq. ( 147 1 . the AIC is given by 

AIC((3) = -2Z:(i'|y) + 2(2P). (48) 
The optimum Q is selected as the one that minimizes A1C{Q). 

5.2 Likeliliood-Ratio Test (LRT) 

Errors in clustered calibration originate from the system (sky and 
instrumental) noise, "clustering eiTors" introduced in section l4.I.II 



and "solver noise" which is refeiTed to as errors produced by the 
calibration algorithm itself. We assume that the true Jones matrices 
along different directions (clusters) at the same antenna are sta- 
tistically uncorrelated. Note that this assumption is only made for 
the LRT to produce reasonable results and this assumption is not 
needed for clustered calibration to work. Therefore, if such corre- 
lations exist, they are caused by the aforementioned eiTors. Con- 
sequently, the more accurate the clustered calibration solutions are, 
the smaller their statistical similarities would be. Based on this gen- 
eral statement, the best number of clusters in a clustered calibration 
procedure is the one which provides us with the minimum correla- 
tions in the calibrated solutions. Note that for a fixed measurement, 
the correlation due to the system noise is fixed. Therefore, differ- 
ences in the statistical similarities of solutions obtained by different 
clustering schemes are only due to "clustering errors" and "solver 
noise". 

To investigate the statistical interaction between the gain solu- 
tions we apply the Likelihood-Ratio Test (LRT). 
Consider the clustered calibration solution J pi (6) for directions i, 
i € {1, 2, . . . , Q}, at antennas p, where p £ {1,2,..., A'^}, 



J21,p 



Jl2,p 
J22,p 



(49) 



Then, the parameter vector Opi (corresponding to the i-th direction 
and p-th antenna) is obtained by 

0p, = [5Rc(Jii,p) 3m(Jii,p) . . . lRe(J22,p) 3m(J22,p)]f . (50) 

Let us define for each antenna p and each pair of directions k and 
I, where k and / are belong to {1, 2, Q}, a vector Zpki as 



{■pkl 



- [Opk Opi] 



(51) 



In fact, we are concatenating the solutions of the same antenna for 
two different directions (clusters) together. 
We define the null Hq model as 



Ho 



tpkl 



' AA(m,Eo). 



where 



and 



[m{e 



pk 



So 



s^iepk) 









pi) 



(52) 



(53) 



(54) 



In Eq. i53l and Eq. ( I54t . m and are denoting sample mean and 
sample variance, respectively. Note that having a large number of 
samples in hand, the assumption of having a Gaussian distribution 
for solutions is justified according to the Central Limit theorem. 
The structure of the variance matrix Eo tells us that the statistical 
correlation between the components of the random vector Zqki, or 
between the solutions 6pk and 6pi, is zero. This is the ideal case in 
which there are no estimation errors. 

To investigate the validity of the null model compared with the 
case in which there exists some correlation between the solutions, 
we define the alternative Hi model as 



Hi : Zpki ~ A/'(m, El) 
where the variance matrix Ei is given by 



El 



s^{0pk) Cov{dpk,Opi) 
Cow{0pk,0pif 



(55) 



(56) 
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Cow{6pk,6pi) in Eq. l l56t is the 8x8 sample covariance matrix. 
Using tiie above models, the Likelihood-Ratio is defined as 

. „, / Likelihood for null model \ 

A = -21n , (57) 

\ Likelihood for alternative model / 

which has a distribution with 64 degrees of freedom. As A be- 
comes smaller, the null model, in which the statistical correlation 
between the solutions is zero, becomes more acceptable rather than 
the alternative model. Therefore, the smaller the A is, the less the 
clustered calibration's eiTors are, and vice-versa. 



6 SIMULATION STUDIES 

We use simulations to compare the performance of un-clustered 
and clustered calibration. Working with simulations has the advan- 
tage of having the true solutions available, which is not the case in 
real observations. That makes the comparison much more objec- 
tive. Nevertheless, the better performance of the clustered calibra- 
tion in comparison with the un-clustered on es in calibrating for rea l 
observations of LOFAR is also shown by iKazemi et aljTioilal '): 
lYatawatta et alj ( [submitted 2012h . 

We simulate an East- West radio synthesis array including 14 
antennas (similar to WSRT) and an 8 by 8 degrees sky with fifty 
sources with low intensities, below 3 Jy. The source positions and 
their brightness follow uniform and Rayleigh distributions, respec- 
tively. The single channel simulated observation at 355 MHz is 
shown in Fig. [8] 

We proceed to add gain errors, multiplying source coherencies 
by the Jones matrices, as it is shown in Eq. ([2j, to our simulation. 
The amplitude and phase of the Jones matrices' elements are gener- 
ated using linear combination of sine and cosine functions. We aim 
at simulating a sky with almost uniform variations on small angular 
scales. In other words, we provide very similar Jones matrices for 
sources with small angular separations. To accomplish this goal, for 
every antenna, we first choose a single direction as a reference and 
simulate its Jones matrix as it is explained before. Then, for the 
remaining forty nine sources, at that antenna, the Jones matrices 
(amplitude and phase terms) are that initial Jones matrix multiplied 
by the inverse of their corresponding angular distances from that 
reference direction. The result of adding such gain errors to our 
simulation is shown in Fig.|9] 



6.1 Performance comparison of the Clustered and 
un-clustered calibrations at SNR=2 

We add noise n ~ CAf{0, a^I) with — 28, as it is shown in Eq. 
(|6), to our simulation. The result has a SNR — 2 and is presented 
in Fig. [To] We have chosen to present the case of SNR — 2 since 
for this particular simulated observation both the divisive hierarchi- 
cal and the weighted K-means clustered calibrations achieve their 
best performances at the same number of clusters, as will be shown 
later in this section. 

We apply un-clustered and clustered calibration on the simu- 
lation to compare their efficiencies. The fifty sources are grouped 
into Q G {3, 4, . . . , 49} number of clusters, using the proposed 
divisive hierarchical and weighted K-means clustering algorithms. 
Self-calibration is implemented via Space Altern ating General- 
ized Expectation Maximization (SAGE) algorithm dFessler & Herd 



[1994 [Yatawatta et alJlioO^ : [Kazemi et alj[201 Ibl) with nine itera- 
tions. Plots of the averaged Frobenius distance between the simu- 
lated (true) Jones matrices and the obtained solutions is shown in 
Fig-En As we can see in Fig.[TT] for both clustering schemes, in- 
creasing the number of clusters decreases this distance and the min- 
imum is reached at approximately thirty three clusters (Q = 33). 
Beyond this number of clusters, it increases until the fifty individual 
sources become individual clusters. This shows that the best perfor- 
mance of both the divisive hierarchical and the weighted K-means 
clustered calibrations is approximately at thirty three clusters and 
is superior to that of the un-clustered calibration. 
The Frobenius distance curves in Fig. [TT] the MCRLB curve in 
Fig.[6| and the SINR curve in Fig.[7]illustrate that clustered calibra- 
tion with an extremely low number of clusters does not necessarily 
perform better than the un-clustered calibration. The reason is that 
when there are only a small number of clusters, the interference, or 
the so-called "clustering errors" introduced in section [4. 1.11 is rela- 
tively large. Therefore, the effect of this interference dominates the 
clustering of signals. On the other hand, we reach the theoretical 
performance limit approximately after twenty five number of clus- 
ters and therefore increasing the number beyond this point gives 
highly variable results, mainly because we are limited by the num- 
ber of constraints as opposed to the number of parameters that we 
need to solve for. But, this is not the case for the plots in Fig.|6|and 
Fig.[7| The reason is that the MCRLB results of Fig.[6]as well as the 
SINR results of Fig. [7] are obtained by Monte-Carlo method with 
iterations over fifty different sky and noise realizations. However, 
Fig.[TT]is limited to the presented specific simulation with only one 
sky and one noise realization. 

The residual images of the un-clustered calibration as well as 
the divisive hierarchical and weighted K-means clustered calibra- 
tions for Q = 33 are shown by Fig. [T2] and Fig. [13] respectively. 
As it is shown by Fig. [T2] in the result of un-clustered calibration, 
the sources are almost not subtracted at all and there is a signifi- 
cant residual error remaining. The residuals have asymmetric Gaus- 
sian distribution with variance = 82.29 which is much larger 
than the simulated (true) noise variance — 10.85. On the other 
hand, the sources have been perfectly subtracted in the case of clus- 
tered calibration. Fig. [13] and the residuals converge to the simu- 
lated background noise distribution. The residuals of the divisive 
hierarchical and Weighted K-means clustered calibrations follow 
symmetric zero mean Gaussian distributions with cr^ = 20.17 and 
— 18.76, respectively. These variances are closer to the simu- 
lated one = 10.85 and this indicates the promising performance 
of clustered calibration. As we can see, hierarchical clustered cal- 
ibration provides a slightly better result compared to the K-means 
one. This is due to the fact that hierarchical clustering constructs 
clusters of smaller angular diameters and thus it assigns the same 
calibration solutions to the sources with smaller angular separa- 
tions. 

We also calculate the Root Mean Squared Ertor of Prediction 
(RMSEP) to assess the performance of clustered and un-clustered 
calibrations' non-linear regressions. The results of log(RMSEP), 
presented by Fig. [14] also justify that the best efficiencies of the hi- 
erarchical and K-means clustered calibrations are obtained at thirty 
three number of clusters. But, note that there is a difference be- 
tween the behavior of log(RMSEP) plot of Fig. [14] and the plots 
of MCRLB, SINR, and Frobenius distance between the simulated 
Jones matrices and solutions in Fig. [6] Fig. [7] and Fig.[TT] respec- 
tively. In Fig. [14] log(RMSEP) of clustered calibration is less than 
that of un-clustered calibration, even for extremely low number of 
clusters. This means that even with those low number of clusters. 
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Figure 9. Simulated image witli added gain en'ors. The errors, tlie complex 2x2 Jones matrices, are generated as linear combinations of sin and cos functions. 
The vaiiation of the sky is almost uniform on small angular scales. 
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Figure 10. Simulated image of sky, corrupted by gain errors and by noise, as in Eq. (6). Tlie simulated noise vector n has zero mean complex Gaussian 
distribution and the SNR is equal to 2. 




Figure 12. Residual image of the un-clustered SAGE calibration for fifty sources. The sources are almost not subtracted at all and there are significant residual 
errors around them. The residuals have asymmetiic Gaussian distribution with vaiiance = 82.29 which is much larger than the trae noise variance 
0-2 = 10.85. 
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Figure 13. The residual images of the clustered calibration using hierarchical (right) and Weighted K-means (left) clustering methods with thirty three 
source clusters. Calibration is implemented by SAGE algorithm. The sources are subtracted perfectly and the residuals converge to the simulated background 
noise distribution. The hierarchical clustering and Weighted K-means residuals follow symmetric zero mean Gaussian distributions with cr^ = 20.17 and 
= 18.76, respectively, where the simulated noise distribution is, CN{Q, 10.851). 




10 20 30 40 50 



Number of Clusters 

Figure 11. The average Frobenius distance between the simulated (true) 
Jones matrices and the solutions of clustered and un-clustered calibrations. 
The two curves represent clustered cahbration via divisive hierarchical and 
weighted K-means clustering algorithms. By increasing the number of clus- 
ters, for both clustering methods, this distance is decreased and gets its min- 
imum approximately at thirty three clusters. After that, it is increased till the 
fifty individual sources. That shows that the best performance of the clus- 
tered calibration is at around thirty three clusters. 



clustered calibration performs better than the un-clustered calibra- 
tion. This is somewhat in disagreement with the scenarios shown in 
Fig-IH Fig.l?] and Fig.[TT] For a better understanding of the reason 
behind this contrast, first lets see how the residual errors are origi- 
nated. 

Based on Eq. l ll3t and Eq. l l40t . in the clustered calibration strategy, 



we have 

Q _ X 
y = £i,W + ^r. + n, (58) 

1=1 i=l 

After executing a calibration for the above data model, there is a 
distance between the target parameters 6 and the estimated solu- 
tions 0. This is the so-called "solver noise", mentioned in section 
15.21 Thus, the residuals are given as 

Q ^ Q ^ ^ 

y-j2M0) = £{s.(6l) -i,(6l)} + + n. (59) 

i—l i—1 i=l 

From Eq. i59\ . we immediately see that the background noise n is 
fixed and the "clustering errors" are calculated for all the sources 
as 

T,'tLi'^^- However, since we solve only for Q directions and 
not for all the K sources individually, the "solver noise" part, 

Ylf:=i{^i{^) is ''Iso calculated only for Q clusters and 

not for all the K sources. It is clear that for a very small Q, this 
term could be much less than for Q ~ K. Therefore, in Fig.[TT] the 
result of RMSEP at a very low number of clusters is still less than 
the ones at Q — K. When Q is not at the two extremes of being 
very small or very large (almost equal to the number of individual 
sources), then the result of RMSEP is promising. Moreover, apply- 
ing more accurate calibration methods or increasing the number of 
iterations, the "solver noise" will decrease and subsequently, we 
expect the RMSEP curve to have the same behavior as the curves 
of Fig.|7]andFig.[n] 
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Figure 14. The RMSEP for clustered and un-clustered calibrations. The re- 
sults are obtained using a base ten logarithmic scale. The two curves are cor- 
responding to clustered calibration via divisive hierarchical and weighted 
K-means clustering algorithms. By increasing the number of clusters, the 
results are decreased and the minimum result is obtained at around thirty 
three clusters. After that, the results are increased till the fifty individual 
sources. That shows the superior performance of the clustered calibration 
compared to the un-clustered one. The best performance of the clustered 
calibration for both of the applied clustering methods is at around thirty 
three number of clusters. 



6.2 Optimum number of clusters for SNR=2 

We utilize AIC and LRT to select the optimum number of clus- 
ters for which clustered cahbration achieves its best performance. 
The methods are applied to our simulation for the case of SNR=2. 
The AIC and LRT results are shown in Fig. [15] and Fig. [T6] re- 
spectively. They both agree on Q = 33 as the optimum number of 
clusters for the divisive hierarchical and Weighted K-means clus- 
tered calibrations. Likelihood-Ratio plot of Fig. [T6]has almost the 
same behavior as the plot of Frobenius distance between the sim- 
ulated Jones matrices and the obtained solutions presented by Fig. 
nn The reason is that the results of the both plots are obtained using 
the solutions themselves as the input data. However, since AIC re- 
sults are computed using the residual eiTors as inputs, which is also 
the case for obtaining the RMSEP curves of Fig. [141 AIC curves of 
Fig.[T5]are slightly steeper than the Frobenius distance between the 
simulated Jones matrices and solutions and the Likelihood-Ratio 
curves of Fig.[TT]and Fig. [16] respectively. 



6.3 Clustered calibration's efficiency at different SNRs 

We start changing the noise in our simulation to see how it ef- 
fects the clustered calibration's efficiency. We simulate the cases 
for which SNRG {1,2,...,15} and apply clustered calibration on 
them. Since the sky model does not change, the clusters obtained by 
divisive hierarchical and weighted K-means methods for the case of 
SNR= 2 remain the same. Fig. [IT] shows the optimum number of 
clusters, on which the best performances of clustered calibrations 
are obtained for those different SNRs. As we can see in Fig. [17] 
for low SNRs, the optimum Q is small. By increasing the SNR, the 
optimum Q is increased till it becomes equal to the number of all 
the individual sources that we have in the sky, i.e., K. This means 
that when the SNR is very low, the benefit of improving signals 
by clustering sources is much higher than the payoff of introducing 
"clustering errors" in a clustered calibration procedure. Therefore, 
clustered calibration for Q <^ K has a superior performance com- 
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Figure 15. AIC plot for clustered and un-clustered cahbrations. Both the 
weighted K-means and divisive hierarchical clustered calibrations get their 
minimum AIC at about thirty three clusters. This illustrates that their best 
perfomiances are obtained at this number of clusters. Also, their AIC re- 
sults at thirty three clusters is lower than the un-clustered calibration's AIC, 
which shows their better performances compared to the un-clustered cali- 
bration. 



ISOOr 



•Weighted K-means 
Divisive Hierarchical 
—un-clustered 




20 30 
Number of clusters 

Figure 16. Likelihood-ratio of the gain solutions obtained by clustered and 
un-clustered calibrations. In the both cases of weighted K-means and divi- 
sive hierarchical clustered calibrations, the minimum Likelihood-ratio val- 
ues belong to approximately thuty three number of clusters . These min- 
imums are also lower than the un-clustered calibration's Likelihood-ratio 
result. Therefore, clustered calibration via both the clustering methods per- 
forms better than the un-clustered calibration and it achieves the best accu- 
racy in its solutions at thirty three clusters. 



pared to the un-clustered calibration. While for a high enough SNR, 
the situation becomes the opposite. In this case, the un-clustered 
calibration achieves better results compared to clustered calibration 
having the disadvantage of introducing "clustering eiTors". 



6. 3. 1 Empirical estimation of SINR 

Having the results of Fig. [17] in hand, we could find an empirical 
model for estimating the optimum number of clusters for various 
SNRs. Note that by changing the observation and the instrument 
characteristics, this model will also be changed. 

As it is explained in section 14.1.31 the best performance of 
clustered calibration is obtained when the SINR is at its highest 
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Figure 17. The optimum number of clusters, on which the best performance 
of clustered calibration is obtained, at different SNRs. For low SNRs, the 
efficiency of the clustered calibration is superior to the un-clustered cali- 
bration. As the SNR gets higher, clustered calibrations achieve their best 
solutions utilizing a higher number of clusters. Finally, when the SNR is 
high enough, the performance of un-clustered calibration becomes better 
than the clustered one. 



level. Fig.[T7]shows the number of clusters on which the maximum 
SINR was obtained, where the signal (sky) and the noise powers are 
known a priori. Thus, the only unknown for estimating the SINR 
is the interference, or the "clustering errors", for which we need 
to have a prediction model. After estimating the SINR using this 
model, finding the optimum Q will be straightforward. 

Consider the definition of "clustering errors" given by Eq. 
| |3U . It is logical that for every source, the difference between its 
true Jones matrix and the clustered calibration solution, 1 1 J — J| |, is 
a function of the angular distance between the source and the cen- 
troid of the cluster that it belongs to. Based on this and using Eq. 
( I3U and Eq. l |34t . for the interference of the i-th cluster at baseline 
p — qwe assume that 



leLi 



'CA/'(0,r7||C 



i{pq} I 



(60) 



where rj and f are unknowns. Eq. ( |60t , in fact, considers an inter- 
ference power (variance) of r]\\Cnpqy\\'^ {D{Li)}'' for every i-th 
cluster, i G {1, 2, Q}, at baseline p — g. Assuming the interfer- 
ences of different clusters to be statistically independent from each 
other, and bearing in mind that the baseline's additive noise Np, 
has also a complex Gaussian distribution independent from those 
interferences', then the noise plus interference power for the i-th 
cluster at baseline p — g is obtained by 



(61) 



Fitting suitable rj and to Eq. l |61i l, the SINR for the i-th cluster at 
baseline p — g is equal to the cluster's signal power, ||Ci{pq}|p, 
divided by the result of Eq. ( I61t . Subsequently, estimation of 
E{SINRc} will be straight forward where the expectation is cal- 
culated with respect to all the source clusters and all the base- 
lines. Note that simulation provides us with the true noise power, 
||Np,|p.Inthe case of having a real observation, this power could 
be estimated by Eq. l |46t . 

Fig. [TS] shows the number of clusters on which divisive hierar- 
chical and weighted K-means clustered calibrations achieve their 
maximum estimated E{SINRc}. The results are calculated for 
SNRG {1, 2, . . . , 15}. For the hierarchical clustering rj = 1550 
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Figure 18. Optimum number of clusters at which the divisive hierarchi- 
cal (top) and Weighted K-means (bottom) clustered calibrations perform 
the best. For both of the clustering methods, the results obtained by SINR 
estimations mostly match the true optimum number of clusters. 



and ly — 0.3, and for the K-means clustering rj = 2500 and 
v = 0.003. As we can see, for both the clustering methods, these 
maximum E{SINRc}s are mostly obtained at the true optimum 
number of clusters for which clustered calibration performed the 
best. Introducing more refined models compared to Eq. ( 160) could 
even improve the current result. 



6.4 Realistic sky models 

So far, we have limited our studies to sky models in which the 
brightness and position of the radio sources follow Rayleigh and 
uniform distributions, respectively. These characteristics provide us 
with a smooth and uniform variation of flux intensities in our sim- 
ulated skies. In such a case, the effects of the background noise on 
the faintest and the strongest signals are almost the same. There- 
fore, if clustered calibration performs better than the un-clustered 
calibration that would be only based on upgrading the signals 
against the noise. Although, in nature, we mostly deal with the sky 
models in which the distribution of the flux intensities is a power 
law, with a steep slope, and the spatial distribution is Poisson. 
Hence, there exist a few number of very bright sources, whose sig- 
nals are considerably stronger than the others, and they are sparse in 
the field of view. The corruptions of the background noise plus the 
interferences of the strong signals of those few bright sources make 
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the calibration of the other faint point sources impractical. Thus, 
there is the need for utilizing the clustered calibration which applies 
the solutions of the bright sources to their closed by fainter ones or 
solves for upgraded signals obtained by addin g up a group of fain t 
signals together . This has been shown by Ka zemi et alj j20l"T^ : 
lYatawatta et al.l ( [submitted 2012) . when comparing the efficiency 
of the clustered and un-clustered calibrations on LOFAR real ob- 
servations. In this section, using simulations, we also reveal the 
superiority of clustered calibration compared to the un-clustered 
calibration for such sky models. 

We sim ulate a sky of 52 ra dio point sources which are obtained 
by modified Ijelic et al.l ( l2008h foreground model. The brightness 
distribution of the poi nt sources follows the source count function 
obtained at 151 MHz jWillott e t al. 2001), while the angular clus- 
tering of the sources are characterized by a typical two-point corre- 
lation function. 



p{d) = Ad.- 



(62) 



In J62b . p is the two point correlation function, d is the angular sep- 
aration, and A is the normalization amplitude of p. The flux cut off 
is 0.1 Jy. 

We corrupt the signals with gain errors which are linear combina- 
tions of sin and cos functions, as in our previous simulations. At 
the end, a zero mean Gaussian thermal noise with a standard devi- 
ation of 3 mjy is added to the simulated data. The result is shown 
in Fig. [19] In Fig.[T9l all the bright sources are concentrated on the 
right side of the image, rather than being uniformly distributed in 
the field of view, and the rest of the sources are so faint that are 
almost invisible. 

We apply the clustered and un-clustered calibrations on Q G 
{3, 4, . . . 51} number of clusters and K — 52 number of individ- 
ual sources, respectively. The clustering method used is the divi- 
sive hierarchical and the calibrations are executed via SAGE al- 
gorithm with nine number of iterations. The residual noise vari- 
ances obtained are demonstrated in Fig. |20] As Fig.|20]shows, the 
level of the residual noise obtained by the clustered calibration for 
Q £ {15, 16, . . . 45} number of clusters is always below the result 
of the un-clustered calibration. This proves the better performance 
of the clustered calibration. The best result of the clustered calibra- 
tion, with the minimum noise level, is achieved for Q — 27 number 
of clusters. 

The residual images of the clustered calibration with Q = 27 
number of source clusters, and the un-clustered calibration for 
K = 52 individual sources are shown by Fig.|2T| In the right side 
of the residual image of the un-clustered calibration there exist ar- 
tificial stripes caused by over and under estimating the brightest 
sources of the field of view. That shows the problematic perfor- 
mance of the un-clustered calibration. However, clustered calibra- 
tion has generated much less artificial effects after subtracting these 
sources. On top of that, the zoomed in window in the left side of 
the images of Fig. |2T] show that the faint sources are not removed 
by the un-clustered calibration at all, while being almost perfectly 
subtracted by the clustered calibration. Moreover, the residual noise 
of the clustered calibration follows a symmetric zero mean Gaus- 
sian distributions with a standard deviation of 4.2 mJy, while the 
one from the un-clustered calibration has an asymmetric Gaussian 
distribution with mean and standard deviation equal to -1.2 and 5.3 
mJy, respectively. Taking to account that the simulated noise dis- 
tribution is a zero mean Gaussian distribution with a variance of 
3 mJy, the superior performance of the clustered calibration com- 
pared to the un-clustered one is evident. 

As the final conclusion of this simulation, calibrating below 
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Figure 20. The noise variances of the residual images, obtained by clus- 
tered and un-clustered calibrations, in mJy. The level of the residual noise 
obtained by the clustered cahbration for Q G {15, 16, . . . 45} number of 
clusters stands below the result of the un-clustered calibration. That reveals 
the superior performance of the clustered calibration in comparison with the 
un-clustered one. The best result of the clustered calibration which achieves 
the minimum noise level is at Q = 27 number of clusters. 



the noise level, clustered calibration always performs better than 
the un-clustered calibration. This is regardless of the sky model and 
is only based on the fact that solving for individual sources with 
very poor signals is impractical. Nevertheless, when some sources 
are very close to each other, the sky corruption on their signals 
would be exactly the same and there is no point in solving for every 
of them individually. 



7 CONCLUSIONS 

In this paper, we demonstrate the superior performance of "clus- 
tered calibration" compared to un-clustered calibration especially 
in calibrating sources that are below the calibration noise level. The 
superiority is in the sense of having more accurate results by the 
enhancement of SNR as well as by the improvement of computa- 
tional efficiency by reducing the number of directions along which 
calibration has to be performed. 

In a "clustered calibration" procedure, sky sources are 
grouped into some clusters and every cluster is calibrated as a sin- 
gle source. That replaces the coherencies of individual sources by 
the total coherency of the cluster. Clustered calibration is applied to 
these new coherencies that carry a higher level of information com- 
pared with the individual ones. Thus, for the calibration of sources 
below the noise level it has a considerably better performance com- 
pared to un-clustered calibration. Another way of looking at clus- 
tering is to consider the distribution of source flux densities, i.e. 
the number of sources vs. the source flux density curve. Regardless 
of the intrinsic flux density distribution, clustering makes the num- 
ber of clusters vs. the cluster flux density curve more uniform, thus 
yielding superior performance. An analytical proof of this superi- 
ority, for an arbitrary sky model, is presented using MCRLB and 
SINR analysis. 

KLD and LRT are utilized to detect the optimum number of 
clusters, for which the clustered calibration accomplishes its best 
performance. A model for estimating SINR of clustered calibration 
is also presented by which we could find the optimum number of 
clusters at low computational cost. 
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Figure 19. Simulated observation of fifty two sources wliicli are obtained by modified I jelic et aP i2008h foreground model. The corrupting gain errors are 
generated as linear combinations of sin and cos functions. The image size is 8 by 8 degrees and the additive thermal noise is a zero mean Gaussian noise with 
a standard deviation of 3 mjy. 



Divisive hierarchical as well as Weighted K-means clustering 
methods are used to exploit the spatial proximity of the sources. 
Simulation studies reveal clustered calibration's improved perfor- 
mance at a low SNR, utilizing these clustering algorithms. Both the 
clustering methods are hard clustering techniques which divide data 
to distinct clusters. However, we expect more accurate results using 
fuzzy (soft) clustering, which constructs overlapping clusters with 
uncertain boundaries. Application and performance of this type of 
clustering for clustered calibration will be explored in future work. 
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Figure 21. The residual images of tlie clustered calibration for Q = 27 number of clusters (right) and the un-clustered calibration for K = h2 (left). 
Calibration is implemented by SAGE algorithm with nine number of iterations and the clustering method applied is divisive hieraixhical. In the right side of 
the residual image of the un-clustered calibration we see stripes of over and under estimations due to problematic performance of the un-clustered calibration 
in subtracting the brightest sources. The zoomed in window in the left side of the image also shows that the faint sources are not removed at all. However, 
clustered calibration could remove all the faint sources almost perfectly and has generated much less artefacts after subtracting for the brightest sources in the 
light side of the field of view. Moreover, the residual noise of the clustered calibration follows a symmetric zero mean Gaussian distributions with a standard 
deviation of 4.2 mjy, while the one from the un-clustered calibration has an asymmetric Gaussian distribution with mean and standard deviation equal to -1.2 
and 5.3 mJy, respectively. Taking to account that the simulated noise distribution is a zero mean Gaussian distribution with a standard deviation of 3 mJy, the 
better performance of the clustered calibration compared to the un-clustered one is evident. 
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